Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function

Authors

  • Siti Sakira Kamaruddin
  • Abdul Razak Hamdan
  • Azuraliza Abu Bakar
  • Fauzias Mat Nor
Abstract

The rapid growth in the volume of textual data has increased research interest in mining text to detect deviations. Specialized methods for specific domains have emerged to meet various needs in discovering rare patterns in text. This paper focuses on a graph-based approach to text representation and presents a novel error tolerance dissimilarity algorithm for deviation detection. We address two non-trivial problems: the semantic representation of text and the complexity of graph matching. We employ the conceptual graph interchange format (CGIF), a knowledge representation formalism, to capture the structure and semantics of sentences, and we use the proposed error tolerance dissimilarity algorithm to detect deviations in the CGIFs. We evaluate our method by analyzing real-world financial statements to identify deviating performance indicators. The method performs better than two related text-based graph similarity measures. It identifies deviating sentences, and its results correlate strongly with expert judgments. Furthermore, it offers error-tolerant matching of CGIFs and retains linear complexity as the number of CGIFs increases.
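
The general idea of scoring CGIF sentences by dissimilarity can be illustrated with a minimal Python sketch. It is not the algorithm from the paper, whose details are not given in the abstract. It assumes a simplified CGIF reading (only concept type labels and relation names are extracted, and coreference labels are ignored), a Jaccard-style distance, and a majority-vote reference graph so that each sentence is compared once, which keeps the work linear in the number of CGIFs. All function names, the weight, the threshold, and the sample statements are hypothetical.

```python
import re
from collections import Counter


def parse_cgif(text):
    """Reduce a CGIF string to (concept labels, relation names).

    Simplification (assumption): only the first word inside [...] and (...)
    is kept; referents and coreference labels such as *x or ?x are ignored.
    """
    concepts = frozenset(re.findall(r"\[\s*(\w+)", text))
    relations = frozenset(re.findall(r"\(\s*(\w+)", text))
    return concepts, relations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dissimilarity(g1, g2, concept_weight=0.5):
    """Weighted mix of concept and relation distances (weight is an assumption)."""
    return (concept_weight * jaccard_distance(g1[0], g2[0])
            + (1.0 - concept_weight) * jaccard_distance(g1[1], g2[1]))


def reference_graph(graphs):
    """Keep the labels that occur in more than half of the graphs."""
    concept_counts = Counter(c for concepts, _ in graphs for c in concepts)
    relation_counts = Counter(r for _, relations in graphs for r in relations)
    half = len(graphs) / 2
    return (frozenset(c for c, n in concept_counts.items() if n > half),
            frozenset(r for r, n in relation_counts.items() if n > half))


def find_deviations(cgif_strings, threshold=0.5):
    """Return indices of CGIFs whose distance to the reference exceeds threshold."""
    graphs = [parse_cgif(s) for s in cgif_strings]
    ref = reference_graph(graphs)
    return [i for i, g in enumerate(graphs) if dissimilarity(g, ref) > threshold]


# Made-up sample statements, for illustration only.
statements = [
    "[Profit: *x] [Year: *y] (Increase ?x ?y)",
    "[Profit: *x] [Year: *y] (Increase ?x ?y)",
    "[Profit: *x] [Year: *y] (Increase ?x ?y)",
    "[Loss: *x] [Quarter: *y] (Decrease ?x ?y)",
]
print(find_deviations(statements))  # prints [3]
```

Because the score is graded rather than an exact graph match, a sentence that shares most of its labels with the reference is not flagged. This is only one loose reading of "error tolerance" and may differ from the paper's definition.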

Similar resources

Outlier detection in financial statements: a text mining method

This paper presents a text mining methodology to extract outlying knowledge from a collection of financial statements. The main idea is to extract relevant financial performance indicators and discover implicit textual description of the indicators. The extracted information was represented using a network language i.e. conceptual graph. Outlier mining was performed on the conceptual graph repr...


Efficient Design of Embedded Signal Processing Systems Using Topological Patterns Based Dataflow Graph Representations

Tools for designing signal processing systems with their semantic foundation in dataflow modeling often use high-level graphical user interfaces (GUIs) or text based languages that allow specifying applications as directed graphs. Such graphical representations serve as an initial reference point for further analysis and optimizations that lead to platform-specific implementations. For large-sc...


Detecting Deviations in Text Collections: An Approach Using Conceptual Graphs

Deviation detection is an important problem of both data and text mining. In this paper we consider the detection of deviations in a set of texts represented as conceptual graphs. In contrast with statistical and distance-based approaches, the method we propose is based on the concept of generalization and regularity. Among its main characteristics are the detection of rare patterns (...


Implementing Knowledge Interchange for Simulated Entities

This paper describes the techniques, which are being developed and used by Bevilacqua Research Corporation (BRC), to address the cooperative development and reuse of knowledge for intelligent simulations. The work is based on the use of the draft proposed American National Standards (dpANS) Conceptual Graph standard that defines a Conceptual Graph Interchange Format (CGIF). This standard, while...


Data Models for Conceptual Structures

A well-founded data model for Conceptual Structures can help in understanding issues of definitional semantics, efficient implementations and even syntax of proposed languages. This paper presents several useful data models of increasing complexity and applicability that can support Conceptual Structures definitional semantics. The models are presented in Haskell, a non-strict, strongly typed fu...


Journal:
  • Intell. Data Anal.

Volume 16  Issue

Pages  -

Publication date 2012